Modulation Spectrum Analysis for Recognition of Reverberant Speech
Authors
Abstract
Recognition of reverberant speech constitutes a challenging problem for typical speech recognition systems. This is mainly due to the limitations of conventional short-term analysis/compensation techniques. In this paper, we present a feature extraction technique based on modeling long segments of the temporal envelopes of the speech signal in narrow sub-bands using frequency domain linear prediction (FDLP). FDLP provides an all-pole approximation of the Hilbert envelope of the signal by applying linear prediction to the cosine transform of the signal. We show that the FDLP modulation spectrum plays an important role in the robustness of the proposed feature extraction. Automatic speech recognition (ASR) experiments on speech data degraded with a number of room impulse responses (with varying degrees of distortion) show significant performance improvements for the proposed FDLP features when compared to other robust feature extraction techniques (an average relative reduction of 40% in word error rate). Similar improvements are also obtained for far-field data that contain natural reverberation in addition to background noise.
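The FDLP idea in the abstract can be illustrated with a small sketch. Because linear prediction is run on cosine-transform coefficients rather than on time samples, the resulting all-pole "spectrum" is indexed by time and approximates (up to a gain) the squared Hilbert envelope of the segment. Everything below is an assumption for illustration: the helper names (`dct2`, `levinson_durbin`, `fdlp_envelope`) are hypothetical, the sub-band is a plain rectangular slice of DCT coefficients, and the model order is arbitrary. The paper's actual sub-band decomposition, model order and feature pipeline are not reproduced here.

```python
import cmath
import math


def dct2(x):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * k * (2n + 1) / (2N))."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len)) for n in range(n_len))
        for k in range(n_len)
    ]


def autocorr(seq, max_lag):
    """Biased autocorrelation r[m] = sum_k seq[k] * seq[k + m] for m = 0..max_lag."""
    return [sum(seq[k] * seq[k + m] for k in range(len(seq) - m)) for m in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns A(z) coefficients (a[0] = 1) and residual energy."""
    if r[0] <= 0.0:
        raise ValueError("zero-energy input")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for order i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=20, band=None):
    """Hypothetical FDLP sketch: all-pole envelope of x, optionally for one DCT sub-band.

    LP is applied to DCT coefficients, so err / |A(e^{j theta})|^2 is a function of
    time: theta = pi * (n + 0.5) / N is mapped back to sample n (approximately).
    """
    n_len = len(x)
    c = dct2(x)
    if band is not None:
        k_lo, k_hi = band
        c = c[k_lo:k_hi]  # assumed rectangular sub-band window over DCT coefficients
    a, err = levinson_durbin(autocorr(c, order), order)
    env = []
    for n in range(n_len):
        theta = math.pi * (n + 0.5) / n_len
        resp = sum(a[j] * cmath.exp(-1j * theta * j) for j in range(order + 1))
        env.append(err / abs(resp) ** 2)
    return env


if __name__ == "__main__":
    # Hypothetical test signal: a Gaussian-windowed tone burst centred at sample 150.
    N = 400
    x = [math.exp(-((n - 150) / 20.0) ** 2) * math.cos(0.5 * n) for n in range(N)]
    full = fdlp_envelope(x)
    sub = fdlp_envelope(x, band=(32, 96))
    print("full-band envelope peak at sample", max(range(N), key=full.__getitem__))
    print("sub-band envelope peak at sample", max(range(N), key=sub.__getitem__))
```

For a tone burst, the DCT coefficients oscillate in k at a rate proportional to the burst's position in time, so the LP model places its spectral peak at the angle that maps back to that position. This time/frequency duality is why a long segment (hundreds of milliseconds in practice) can be summarized by a handful of poles per sub-band.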
Similar articles
Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments
Speech recognition in reverberant environments is still a challenging problem. In this paper, we first investigated the reverberation effect on subband temporal envelopes by using the modulation transfer function (MTF). Based on the investigation, we proposed an algorithm which normalizes the subband temporal modulation spectrum (TMS) to reduce the diffusion effect of the reverberation. During th...
An MTF-based blind restoration of temporal power envelopes as a front-end processor for automatic speech recognition systems in reverberant environments
To reduce speech degradation in reverberant environments, we previously proposed a modulation transfer function (MTF) based method of speech restoration. The room impulse response (RIR) in this restoration does not need to be measured at any time since we modeled the power envelope of the RIRs as an exponential decay function. Speech is assumed to be temporally modulated with a white-noise carrier ...
Robust front end processing for speech recognition in reverberant environments: utilization of speech characteristics
This paper proposes two methods for robust automatic speech recognition (ASR) in reverberant environments. Unlike other methods, which mostly apply inverse filtering with blindly estimated room impulse responses to achieve dereverberation, the proposed methods are based on the utilization of the characteristics of speech. The first method, Harmonicity-based Feature Analysis, takes advantage of the...
A robust feature extraction based on the MT reverberant envir
This paper proposes a robust feature extraction method for automatic speech recognition (ASR) systems in reverberant environments. In this method, a sub-band power envelope inverse filtering algorithm based on the modulation transfer function (MTF), which we have previously proposed, is incorporated as a front-end processor for ASR. The impulse response of the room acoustics is assumed to be expo...
Robust ASR in Reverberant Environments Using Temporal Cepstrum Smoothing for Speech Enhancement and an Amplitude Modulation Filterbank for Feature Extraction
This paper presents techniques aimed at improving automatic speech recognition (ASR) in single-channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. System improvements range from speech enhancement and robust feature extraction to model adaptation and word-based integration of multiple classifiers. The selective temporal cepstrum ...
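Several of the related MTF-based methods listed above model the power envelope of the room impulse response as an exponential decay. Under that assumption the modulation transfer function has a well-known closed form: if the power envelope is exp(-13.8 t / T60) (a 60 dB decay over T60 seconds), then m(F) = 1 / sqrt(1 + (2 pi F T60 / 13.8)^2) for modulation frequency F. The sketch below (hypothetical helper names, an assumed 8 kHz envelope sampling rate) evaluates that formula and checks it against a direct numerical transform of the decay. It is an illustration of the assumed RIR model, not any of the cited papers' restoration algorithms.

```python
import cmath
import math


def mtf_closed_form(f_mod, t60):
    """MTF of an RIR whose power envelope is exp(-13.8 t / T60), i.e. 60 dB decay over T60 s."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * t60 / 13.8) ** 2)


def mtf_numeric(f_mod, t60, fs=8000.0, duration=None):
    """Same quantity by direct summation: |sum e(t) exp(-j 2 pi F t)| / sum e(t)."""
    if duration is None:
        duration = 3.0 * t60  # the decay beyond 3 * T60 is below -180 dB and is ignored
    num = 0j
    den = 0.0
    for i in range(int(duration * fs)):
        t = i / fs
        e = math.exp(-13.8 * t / t60)  # assumed exponential power envelope of the RIR
        num += e * cmath.exp(-2j * math.pi * f_mod * t)
        den += e
    return abs(num) / den


if __name__ == "__main__":
    for t60 in (0.3, 0.6, 1.0):
        row = ", ".join(f"{f:g} Hz: {mtf_closed_form(f, t60):.3f}" for f in (2.0, 4.0, 8.0, 16.0))
        print(f"T60 = {t60} s -> {row}")
```

The closed form shows why reverberation hurts most at higher modulation frequencies and longer T60: the modulation depth of the speech envelope is attenuated by m(F), which is what envelope-domain methods such as FDLP features or MTF-based inverse filtering try to model or undo.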